Project: Investigating factors that affect showing up for medical appointments in Brazil using the 'No-show-appointments' dataset

Table of Contents

Introduction

The dataset

The dataset can be obtained from here: https://www.kaggle.com/datasets/joniarroba/noshowappointments

The dataset contains 110,527 medical appointments, each described by 14 associated features. Whether the patient showed up for the appointment is the target variable.

Possible questions to be answered by the analysis:

  1. Which gender showed up most for appointments?
  2. On what days, months, and years, and at what times, were the appointments made? Could this affect showing up for the appointment?
  3. Which ages showed up most for appointments?
  4. Did the neighbourhood affect showing up for the appointment?
  5. Did having a scholarship affect showing up?
  6. Which disease is associated with a high number of show-ups?
  7. Does receiving an SMS affect the likelihood of showing up?

Data Wrangling

In this section we will:

  1. Load the data into a pandas dataframe.
  2. Perform basic checks on the data for cleanliness.
  3. Trim and clean the dataset for analysis.

General Properties

Step 1: Load Data.
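A minimal sketch of the loading step, assuming the Kaggle download is a CSV file readable by `pandas.read_csv`. The helper name `load_appointments` and the two inline rows are made up for illustration, so the snippet runs without the real file; with the actual download, pass its path instead of the in-memory sample.

```python
import io

import pandas as pd


def load_appointments(source):
    """Read the appointments CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# Two made-up rows using a subset of the column names assumed from the Kaggle file.
sample_csv = io.StringIO(
    "PatientId,AppointmentID,Gender,ScheduledDay,AppointmentDay,Age,No-show\n"
    "1,100,F,2016-04-29T18:38:08Z,2016-04-29T00:00:00Z,62,No\n"
    "2,101,M,2016-04-27T08:36:51Z,2016-04-29T00:00:00Z,8,Yes\n"
)

df = load_appointments(sample_csv)
print(df.shape)   # (rows, columns)
print(df.head())  # first few records
```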

Step 2: Basic Checks.
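The basic checks could look roughly like the sketch below: shape, missing values, duplicated rows, and column types. The helper `basic_checks` and the small demo frame are hypothetical; the original notebook may have used different checks.

```python
import pandas as pd


def basic_checks(df):
    """Collect a few cleanliness indicators for a DataFrame."""
    return {
        "shape": df.shape,
        "missing_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Made-up demo data: the last two rows are identical on purpose.
demo = pd.DataFrame({
    "Gender": ["F", "M", "M"],
    "Age": [62, 8, 8],
    "No-show": ["No", "Yes", "Yes"],
})

report = basic_checks(demo)
for name, value in report.items():
    print(name, "->", value)
```

On the real data, the same function can be called on the loaded DataFrame to decide what needs cleaning.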

Insights from the basic checks.

The dataset given on medical appointments:

Data Cleaning

In this section we will deal with cleaning the following:

Exploratory Data Analysis

In this section we will compute statistics and create visualizations to answer the questions posed in the Introduction.

Research Question 1: Which gender showed up most for the appointment?
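One way to approach this question is to compare both the number of appointments and the show-up rate per gender. The sketch below assumes the Kaggle column names `Gender` and `No-show`, and that `No-show == "No"` means the patient showed up; the helper `show_up_by_group` and the demo rows are illustrative only.

```python
import pandas as pd


def show_up_by_group(df, column, target="No-show"):
    """Count appointments and show-ups per group and derive the show-up rate."""
    # Assumption: "No" in the No-show column means the patient attended.
    flags = df.assign(showed_up=df[target].eq("No"))
    summary = flags.groupby(column)["showed_up"].agg(["count", "sum", "mean"])
    summary.columns = ["appointments", "showed_up", "show_up_rate"]
    return summary


# Made-up demo rows, not taken from the dataset.
demo = pd.DataFrame({
    "Gender": ["F", "F", "F", "M", "M"],
    "No-show": ["No", "No", "Yes", "No", "Yes"],
})

summary = show_up_by_group(demo, "Gender")
print(summary)
```

Looking at the rate alongside the raw counts matters because one gender can have more show-ups simply by having more appointments.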

Insights from the Gender column

Research Question 2: Did the timing of the appointment affect the show up?
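To study timing, the date columns first need to be parsed. The following sketch assumes the Kaggle columns `ScheduledDay` and `AppointmentDay` hold ISO 8601 timestamps; the derived column names (`waiting_days`, `appointment_weekday`, `scheduled_hour`) are hypothetical choices, not the notebook's own.

```python
import pandas as pd


def add_timing_features(df):
    """Return a copy of df with waiting time, weekday, and booking hour added."""
    out = df.copy()
    scheduled = pd.to_datetime(out["ScheduledDay"])
    appointment = pd.to_datetime(out["AppointmentDay"])
    # Compare calendar dates only, so same-day bookings count as 0 days.
    out["waiting_days"] = (
        appointment.dt.normalize() - scheduled.dt.normalize()
    ).dt.days
    out["appointment_weekday"] = appointment.dt.day_name()
    out["scheduled_hour"] = scheduled.dt.hour
    return out


# Made-up demo rows in the assumed timestamp format.
demo = pd.DataFrame({
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-27T08:36:51Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z"],
    "No-show": ["No", "Yes"],
})

timed = add_timing_features(demo)
print(timed[["waiting_days", "appointment_weekday", "scheduled_hour"]])
```

The new columns can then be grouped against the `No-show` column to see whether weekday, booking hour, or waiting time relates to attendance.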

Insights from Time columns

Research Question 3: What age showed up most for the appointment?
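Age is easier to compare when grouped into bands. The sketch below uses `pd.cut` with band edges that are an arbitrary assumption (0, 1, 18, 40, 60, 120), and again treats `No-show == "No"` as having shown up; adjust the edges to whatever grouping suits the analysis.

```python
import pandas as pd


def show_up_by_age_band(df, edges=(0, 1, 18, 40, 60, 120)):
    """Show-up rate per age band; the band edges are illustrative only."""
    showed = df["No-show"].eq("No")
    # right=False gives left-closed bands such as [0, 1) and [1, 18).
    bands = pd.cut(df["Age"], bins=list(edges), right=False)
    return showed.groupby(bands, observed=False).mean()


# Made-up demo rows, not taken from the dataset.
demo = pd.DataFrame({
    "Age": [0, 0, 8, 50, 52],
    "No-show": ["No", "Yes", "No", "No", "No"],
})

rates = show_up_by_age_band(demo)
print(rates)
```

Bands with no appointments show up as missing values rather than zero, which avoids reading an empty band as a 0% show-up rate.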

Insights from age

Research Question 4: Did the neighbourhood affect showing up for the appointment?
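Neighbourhoods with very few appointments can produce misleading rates, so one possible approach is to rank neighbourhoods by show-up rate only after filtering on a minimum number of appointments. The threshold parameter `min_appointments` and the placeholder neighbourhood names below are assumptions for illustration.

```python
import pandas as pd


def show_up_by_neighbourhood(df, min_appointments=2):
    """Rank neighbourhoods by show-up rate, skipping sparsely represented ones."""
    flags = df.assign(showed_up=df["No-show"].eq("No"))
    stats = flags.groupby("Neighbourhood")["showed_up"].agg(["count", "mean"])
    stats.columns = ["appointments", "show_up_rate"]
    enough = stats[stats["appointments"] >= min_appointments]
    return enough.sort_values("show_up_rate")


# Placeholder neighbourhood labels, not real names from the dataset.
demo = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "No-show": ["No", "No", "Yes", "No", "Yes", "No"],
})

ranked = show_up_by_neighbourhood(demo)
print(ranked)
```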

Insights from the Neighbourhood column

Research Question 5: Did scholarship affect show up?
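A row-normalised cross-tabulation is a compact way to compare patients with and without a scholarship. This sketch assumes the `Scholarship` column is coded 0/1 and that the `No-show` column holds "Yes"/"No", as in the Kaggle file; the demo rows are made up.

```python
import pandas as pd

# Made-up demo rows: four patients without a scholarship, two with one.
demo = pd.DataFrame({
    "Scholarship": [0, 0, 0, 0, 1, 1],
    "No-show": ["No", "No", "No", "Yes", "No", "Yes"],
})

# normalize="index" makes each row sum to 1, so the cells are proportions
# within each scholarship group rather than raw counts.
table = pd.crosstab(demo["Scholarship"], demo["No-show"], normalize="index")
print(table)
```

Reading across a row gives the share of that group that showed up ("No") versus missed the appointment ("Yes").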

Insights from the Scholarship column

Research Question 6: Which disease is associated with a high number of show-ups?
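For the disease columns, the show-up rate of patients with a condition can be set next to the rate of those without it. The sketch assumes the Kaggle column names `Hipertension` (spelled that way in the file) and `Diabetes`, both coded 0/1; the helper `compare_condition_rates` is hypothetical and works for any similar binary column.

```python
import pandas as pd


def compare_condition_rates(df, conditions=("Hipertension", "Diabetes")):
    """Show-up rate with and without each binary condition."""
    showed = df["No-show"].eq("No")
    rows = {}
    for name in conditions:
        has_condition = df[name].eq(1)
        rows[name] = {
            "with_condition": showed[has_condition].mean(),
            "without_condition": showed[~has_condition].mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up demo rows, not taken from the dataset.
demo = pd.DataFrame({
    "Hipertension": [1, 1, 0, 0, 0],
    "Diabetes": [1, 0, 0, 0, 1],
    "No-show": ["No", "No", "No", "Yes", "Yes"],
})

comparison = compare_condition_rates(demo)
print(comparison)
```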

a) Hypertension

Insights from hypertension column

b) Diabetes

Insights from Diabetes Column

Research Question 7: Does alcoholism affect whether people show up for their medical appointments?

Insights from the Alcohol column

Research Question 8: Does receiving an SMS impact the likelihood of show up?

Insights from the SMS_received column

Conclusions

Although more females than males showed up for appointments, the show-up pattern is about the same for both genders: roughly 80% showed up and roughly 20% did not.

It appears that many appointments were scheduled for patients aged 0-1. This could be because young children have programs, such as vaccinations, that require regular visits. There also seem to be more appointments at around age 50, especially among females, perhaps because people become more prone to disease as they age, owing to factors such as hormonal changes.

We also found that more females than males have lifestyle diseases such as diabetes and hypertension, and that patients with these diseases are more likely to show up for their appointments.

More men than women in the dataset take alcohol. Those who take alcohol also hold more scholarships, perhaps because they are more exposed to the poverty issues that Bolsa Familia addresses.

Features such as scholarship and SMS received did not appear to contribute significantly to whether patients showed up.

The only limitation I encountered is the lack of proper demographics for the population in this dataset. For instance, we could have performed a better analysis with location information, to test whether living close to the facility (Neighbourhood) affects showing up. Knowing whether the patient lives in a city or a rural setting, combined with whether the neighbourhood facility is urban or rural, would also have helped us understand how residence affects showing up. Literacy level could likewise have informed the analysis.